Symmetric active/active metadata service for high availability parallel file systems

Authors

  • Xubin He
  • Li Ou
  • Christian Engelmann
  • Xin Chen
  • Stephen L. Scott
Abstract

High availability data storage systems are critical for many applications as research and business become more data driven. Since metadata management is essential to system availability, multiple metadata services are used to improve the availability of distributed storage systems. Past research has focused on the active/standby model, where each active service has at least one redundant idle backup. However, interruption of service and even some loss of service state may occur during a fail-over depending on the replication technique used. In addition, the replication overhead for multiple metadata services can be very high. The research in this paper targets the symmetric active/active replication model, which uses multiple redundant service nodes running in virtual synchrony. In this model, service node failures do not cause a fail-over to a backup and there is no disruption of service or loss of service state. A fast delivery protocol is further discussed to reduce the latency of the total order broadcast needed. The prototype implementation shows that metadata service high availability can be achieved with an acceptable performance trade-off using the symmetric active/active metadata service solution. © 2009 Elsevier Inc. All rights reserved.
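
The replication model described in the abstract can be illustrated with a minimal in-process sketch. It is not the paper's protocol or its fast delivery protocol: the class names (`MetadataReplica`, `TotalOrderGroup`), the operation format, and the node names are hypothetical, and the total order broadcast is replaced by a single shared sequence with no networking. It only shows why replicas that apply the same operations in the same order hold identical state, so losing one node requires no fail-over.

```python
from dataclasses import dataclass, field


@dataclass
class MetadataReplica:
    """One active metadata service node holding a full copy of the state."""
    name: str
    alive: bool = True
    state: dict = field(default_factory=dict)

    def apply(self, op):
        # op is a hypothetical (kind, path, value) tuple.
        kind, path, value = op
        if kind == "set":
            self.state[path] = value
        elif kind == "delete":
            self.state.pop(path, None)


class TotalOrderGroup:
    """Toy stand-in for total order broadcast: one global sequence of ops."""

    def __init__(self, replicas):
        self.replicas = replicas
        self.log = []  # the single agreed delivery order

    def broadcast(self, op):
        # Every live replica receives every operation in the same order.
        self.log.append(op)
        for replica in self.replicas:
            if replica.alive:
                replica.apply(op)

    def read(self, path):
        # Any live replica can answer, since all live replicas are identical.
        for replica in self.replicas:
            if replica.alive:
                return replica.state.get(path)
        raise RuntimeError("no live replica")


replicas = [MetadataReplica("mds0"), MetadataReplica("mds1"), MetadataReplica("mds2")]
group = TotalOrderGroup(replicas)

group.broadcast(("set", "/data/a", {"size": 10}))
group.broadcast(("set", "/data/b", {"size": 20}))

replicas[0].alive = False  # simulate a node failure; no fail-over step follows
group.broadcast(("delete", "/data/a", None))
```

After the simulated failure, `group.read("/data/b")` is still served by a surviving replica, and `mds1` and `mds2` hold the same state. A real system would additionally need group membership, failure detection, and an actual ordering protocol, none of which this sketch attempts.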

Similar resources

Symmetric Active/active Metadata Service for Highly Available Cluster Storage Systems

In a typical distributed storage system, metadata is stored and managed by dedicated metadata servers. One way to improve the availability of distributed storage systems is to deploy multiple metadata servers. Past research focused on the active/standby model, where each active server has at least one redundant idle backup. However, interruption of service and loss of service state may occur du...

Active Suspension System in Parallel Hybrid Electric Vehicles

In previous studies, active suspension system in conventional powertrain systems was investigated. This paper presents the application of active suspension system in parallel hybrid electric vehicles as a novel idea. The main motivation for this study is investigation of the potential advantages of this application over the conventional one. For this purpose, a simultaneous simulation is develo...

Scalability of Replicated Metadata Services in Distributed File Systems

There has been considerable interest recently in the use of highly-available configuration management services based on the Paxos family of algorithms to address long-standing problems in the management of large-scale heterogeneous distributed systems. These problems include providing distributed locking services, determining group membership, electing a leader, managing configuration parameter...

Symmetric Active/Active High Availability for High-Performance Computing System Services

This work aims to pave the way for high availability in high-performance computing (HPC) by focusing on efficient redundancy strategies for head and service nodes. These nodes represent single points of failure and control for an entire HPC system as they render it inaccessible and unmanageable in case of a failure until repair. The presented approach introduces two distinct replication methods...

Lessons Learned in Deploying the World’s Largest Scale Lustre File System

The Spider system at the Oak Ridge National Laboratory’s Leadership Computing Facility (OLCF) is the world’s largest scale Lustre parallel file system. Envisioned as a shared parallel file system capable of delivering both the bandwidth and capacity requirements of the OLCF’s diverse computational environment, the project had a number of ambitious goals. To support the workloads of the OLCF’s d...

Journal:
  • J. Parallel Distrib. Comput.

Volume 69

Publication date: 2009